

### 香港中文大學 The Chinese University of Hong Kong

# CSCI2510 Computer Organization Lecture 09: Basic Processing Unit



# Basic Functional Units of a Computer 👢





- Input: accepts coded information from human operators.
- Memory: stores the received information for later use.
- Processor: executes the instructions of a program stored in the memory.
- Output: reacts to the outside world.
- Control: coordinates all these actions.

## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation



## **Basic Processing Unit: Processor**



- Executing machine-language instructions.
- Coordinating other units in a computer system.
- Used to be called the central processing unit (CPU).
  - The term "central" is no longer appropriate today.
  - Today's computers often include several processing units.

• E.g., multi-core processor, graphic processing unit (GPU), etc.



# Main Components of a Processor



# Control Circuitry

Interpret/decode the fetched instruction & issue control signals to coordinate all the other units

Register File (i.e.,  $R_0 \sim R_{n-1}$ )

Served as the processor's general-purpose registers

Arithmetic and Logic Unit (ALU)

Perform arithmetic or logic operations

#### **Instruction Address Generator**

### **Program Counter (PC)**

Keep the address of the next instruction to be fetched and executed (special register)

Update the contents of PC after every instruction is fetched

#### **Instruction Register (IR)**

Hold the instruction until its execution is completed (special register)

### **Other Special Registers**

E.g., memory address register, memory data register, condition code register, stack pointer register, link register, etc.

#### **Processor-Memory Interface**

Allow communication between processor and memory through two special registers: memory address register (MAR) and memory data register (MDR)

## **Recall: Processor-Memory**



- Data transferring takes place through MAR and MDR.
  - MAR: Memory Address Register
  - MDR: Memory Data Register



\*MFC (Memory Function Completed): Indicating the requested operation has been completed.

# Recall: RISC vs. CISC Styles



- There are two fundamentally different approaches in the design of instruction sets for modern computers:
  - 1) Reduced Instruction Set Computer (RISC) reduces the complexity/types of instructions for higher performance.
    - Each instruction fits in a single word in memory.
    - A load/store architecture is adopted.
      - Memory operands are accessed only using Load/Store instructions.
      - The operands involved in arithmetic/logic operations must be either in registers or given explicitly within the instruction.
  - 2) Complex Instruction Set Computer (CISC) allows more complicated but powerful instructions to be designed.
    - Each instruction may span more than one word in memory.
    - The operands involved in arithmetic/logic operations can be in both registers and memory or given explicitly within the instruction.
    - Two-operand format is usually used.

## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

# Five-Stage Organization (RISC CPU)



- The execution of RISC instructions can be generally organized into a five-stage sequence of actions:
  - ① Fetch an instruction and increment the PC.
  - ② Decode the instruction & read source registers.
  - ③ Perform an ALU operation.
  - Read or write memory data if memory operand is involved.
  - Write the result into the destination register.

Note: Not all these actions have to be carried out by every instruction.



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

# Multi-Stage Structure: Datapath



- The processor's hardware can also be organized into multiple stages.
  - The actions taken in each stage can be completed independently and in one clock cycle (hopefully).
- It is necessary to insert inter-stage registers to:
  - Hold the produced results;
  - Work as inputs to the next.
- This multi-stage structure is often called datapath.



## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

# Fetch Phase: Stage ①



- The fetch phase includes two major actions:
  - 1) IR ← [[PC]]
    - Load the memory contents pointed to by PC into IR.
    - The MuxMA is set to select the address from PC.
    - IR\_enable must be set.
  - 2) PC ← [PC] + 4
    - PC enable must be set.





# **Execution Phase: Stage ②**



- The execution phase is generally of four stages:
  - ② Decode the instruction & read source registers.
    - [IR] must be firstly decoded by Control Circuity.
      - It is for generating signals to control all the hardware.
    - Two source registers can be read from Register File at the same time.
      - How? The source register addresses are <u>supplied by</u>
         IR directly (w/o decoding).
      - Two source registers are always read and placed into RA and RB (no matter whether they are needed).



# Instruction Encoding



- Consider a RISC-style processor that
  - Has 32 general-purpose registers;
  - Represents every instruction by a 32-bit word.
- Representative encoding formats include:
  - **①** Three-Operand Format
    - E.g., Add, Rdst, Rsrc1, Rsrc2

| 31  | 27        | 26  | 22 | 21 | 17 | 16 |        | 0 |
|-----|-----------|-----|----|----|----|----|--------|---|
| Rsr | <b>c1</b> | Rsr | c2 | Rd | st |    | OP cod | e |

- ② Immediate-Operand Format
  - E.g., Add Rdst, Rsrc, #Value
  - E.g., Load/Store instruction using register indirect or index modes
  - E.g., **Branch** instruction using offset
- 3 Address-Operand Format
  - E.g., **Branch** instruction
  - E.g., **Call** instruction



# **Execution Phase: Stage 3**



- The execution phase is generally of four stages:
  - ③ Perform ALU operation.
    - The input InA of ALU is supplied by RA.
    - The input InB of ALU is supplied by the multiplexer MuxB which forwards:
      - Either the contents of RB;
      - Or the immediate value specified in IR.
    - **ALU** performs the operation.
    - The computed result is placed in RZ.
    - Note: [RB] will also be forwarded to RM (though it's only needed by Store).



# **Execution Phase: Stage 4**



- The execution phase is generally of four stages:
  - Read/write memory data.
    - The memory read/write takes place via <u>Processor-Mem</u> Interface.
      - The effective address is derived by ALU and kept in RZ in Stage ③.
      - The "loaded" data are put into RY (with the multiplexer MuxY properly set).
      - The "to-be-stored" data are available in RM.
    - Note: For non-Load and non-Store instructions, the data in RZ are forwarded to RY.



# **Execution Phase: Stage** ⑤



- The execution phase is generally of four stages:
  - Write the result into the destination register.
    - The data kept in RY, which can be:
      - Either the result computed by ALU in Stage ③ and forwarded to RY in Stage ④;
      - Or the data loaded from the memory in Stage ④.

are written into **Register File** if needed.

- The dest. reg. address is from IR but is <u>determined</u> by Control Circuitry.
- **RF\_write** must be set.



### **Observations**



- The datapath is designed to be independent, multifunction, and general.
- But not all actions/stages have to be carried out by every instructions.
- Let's examine the actual execution of the following typical instructions:
  - Add R3, R4, R5
  - Load <u>R5</u>, X(R7)
  - Store R6, X(R8)
  - Branch



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

## **Recall: Register Transfer Notation**



- Register Transfer Notation (RTN) describes the <u>data</u> <u>transfer</u> from one <u>location</u> in computer to another.
  - Possible locations: memory locations, processor registers.
    - Locations can be identified symbolically with names (e.g., LOC).

### Ex.

- Transferring the contents of memory LOC into register R2.
- ① Contents of any location: denoted by placing square brackets [] around its location name (e.g. [LOC]).
- ② Right-hand side of RTN: always denotes a value
- 3 Left-hand side of RTN: the name of a location where the value is to be placed (by overwriting the old contents)

# Ex 1: Add R3, R4, R5

22 21 17 16 27 26 Rsrc1 Rsrc2 Rdst OP code

- $MAR \leftarrow [PC]$ , Read memory, Wait MFC, IR  $\leftarrow$  [MDR],  $PC \leftarrow [PC] + 4 \text{ (shown here)}$
- 2 Decode instruction,  $RA \leftarrow [R4], RB \leftarrow [R5]$ 
  - The source register addresses are available in  $IR_{31-27}$  and  $IR_{26-22}$ .
  - As a result, [R4] and [R5] can be read into RA and RB in Stage ②.
- $RZ \leftarrow [RA] + [RB]$ 3
  - MuxB is set to select input from RB.
  - ALU is set to perform an Add.
- $RY \leftarrow [RZ]$ 4

(5)

- MuxY is set to select input from RZ.
- $R3 \leftarrow [RY]$ • The dest. reg. address is in IR<sub>21-17</sub>.
  - RF\_write is set to allow writing R3.



## Class Exercise 9.1



- Assume R3, R4, and R5 originally hold the values 0,
   40, and 60, respectively.
- Considering Add R3, R4, R5, show the registers' contents "after" the completion of Stages ② to ⑤:
  - Note: Fill in "?" for those blanks that cannot be determined.

|     | RA | RB | RZ | RY | R3 |
|-----|----|----|----|----|----|
| 2   |    |    |    |    |    |
| 3   |    |    |    |    |    |
| 4   |    |    |    |    |    |
| (5) |    |    |    |    |    |

# Ex 2: Load <u>R5</u>, X(R7)

- 31 27 26 22 21

  Rsrc Rdst
- Immediate OP code

- ① MAR ← [PC], Read memory,
  Wait\_MFC, IR ← [MDR],
  PC ← [PC] + 4 (shown <u>here</u>)
- ② Decode instruction,
  RA ← [R7]
  - The src. reg. address is in IR<sub>31-27</sub>.
- $3 \text{ RZ} \leftarrow [\text{RA}] + X$ 
  - The immediate value X is from IR.
  - MuxB is set to select input from IR.
  - ALU is set to perform an Add.
- MAR ← [RZ], Read memory,
   Wait\_MFC, RY ← [MDR]
  - MuxMA is set to select input from RZ.
  - MuxY is set to select input from MDR.
- R5  $\leftarrow$  [RY]
  - The dest. reg. address is in IR<sub>26-22</sub>.
  - RF\_write is set to allow writing R5.



## Class Exercise 9.2



 Assume R5 and R7 originally hold the values 0 and 4, respectively, and the contents of main memory are shown as follows:

| 0x0 | 0x1 | 0x2 | 0x3 | 0x4 | 0x5 | 0x6 | 0x7 | 0x8 | 0x9 | ••• |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 9   | 8   | 7   | 6   | 5   | 4   | 3   | 2   | 1   | 0   | ••• |

- Considering Load R5, 4(R7), show the registers' contents "after" the completion of Stages ② to ⑤:
  - Note: Fill in "?" for those blanks that cannot be determined.

|     | RA | RB | RZ | RY | MAR | MDR | R5 | <b>R7</b> |
|-----|----|----|----|----|-----|-----|----|-----------|
| 2   |    |    |    |    |     |     |    |           |
| 3   |    |    |    |    |     |     |    |           |
| 4   |    |    |    |    |     |     |    |           |
| (5) |    |    |    |    |     |     |    |           |

# **Ex 3: Store R6, X(R8)**

- $MAR \leftarrow [PC]$ , Read memory, Wait MFC, IR  $\leftarrow$  [MDR],  $PC \leftarrow [PC] + 4 \text{ (shown here)}$
- 2 Decode instruction,  $RA \leftarrow [R8], RB \leftarrow [R6]$ 
  - The src. reg. address is in IR<sub>31-27</sub>.
  - The dest. reg. address is in IR<sub>26-22</sub>.
- $RZ \leftarrow [RA] + X, RM \leftarrow [RB]$ 3
  - The immediate value X is from IR.
  - [R6] is forwarded to ④ via RM.
- $MAR \leftarrow [RZ], MDR \leftarrow [RM],$ 4 Write memory, Wait MFC
  - MuxMA is set to select input from RZ.
  - The written data are forwarded from RM to MDR.
- No action



27 26

22 21

6 5

CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

## Ex 4: Branch

- ① MAR ← [PC], Read memory,
  Wait\_MFC, IR ← [MDR],
  PC ← [PC] + 4 (shown here)
- ② Decode instruction
- ③ PC ← [PC] + branch offset
  - The branch offset is from IR.
  - MuxINC (in Instruction Address Generator) is set to select offset.
- 4 No action
- S No action





• Branch instructions typically use the address field to specify an offset from the current instruction to the branch target.

LABEL ADDR. OPCODE OPERAND

- Given the program, what is the required offset for Branch LOOP at the memory address 140?
  - Note: This program finds out the smallest number in a list.

| LABEL | ADDR. | OPCODE              | OPERAND     |           |  |  |
|-------|-------|---------------------|-------------|-----------|--|--|
|       | 100   | Move                | <u>R2</u> , | addr LIST |  |  |
|       | 104   | Clear               | R3          |           |  |  |
|       | 108   | Load                | <u>R4</u> , | N         |  |  |
|       | 112   | Load                | <u>R5</u> , | (R2)      |  |  |
| LOOP: | 116   | Subtract            | <u>R4</u> , | R4, #1    |  |  |
|       | 120   | Branch_if_[R4]=0    | DONE        | <u> </u>  |  |  |
|       | 124   | Add                 | <u>R3</u> , | R3, #4    |  |  |
|       | 128   | Load                | <u>R6</u> , | (R2,R3)   |  |  |
|       | 132   | Branch_if_[R6]≥[R5] | LOOF        |           |  |  |
|       | 136   | Move                | <u>R5</u> , | R6        |  |  |
|       | 140   | Branch              | LOOF        |           |  |  |
| DONE: | 144   | Store               | R5,         | RESULT    |  |  |

# Remark 1: Register Enabling



- Inter-stage registers

   (i.e., RA, RB, RZ, RM, &
   RY) are always enabled.
  - The results from one stage are always transferred to the next for simplicity.
- Other registers (e.g., PC, IR, and Register File) must not be changed in every stage.
  - They are enabled only at certain times (by Control Circuitry) via PC\_enable, IR enable and RF write.



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

# **Remark 2: Memory Function Completed**

- For stages involving Mem,
  - If data are in cache, the stage can be completed in one clock cycle.
  - If data are not in cache, the stage may take several clock cycles.
    - To handle such uncertainty:
      - Processor-Memory
         Interface generates the
         MFC signal upon the
         completion of a memory operation.
      - Control Circuitry checks the MFC signal during any stage involving memory to delay subsequent stage(s).



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

# Interconnect (CISC CPU)



- CISC-style instructions require a different and more flexible organization of the processor hardware:
  - Interconnect provides interconnections among other units but does not prescribe any pattern of data flow.
  - Inter-stage registers are not needed, but it is still necessary to have some Temporary Registers.



# **Buses: Key to Transferring Data**



- It is typical to use buses to implement Interconnect.
  - A bus consists of a set of lines that enable data transferring from any one device to any other (connected to the bus).
- There may be multiple devices connected to the bus:
  - Only one can drive the bus at any given time.
  - More than one can receive data from the bus at the same t.
  - For this reason, switches (→, →, ) are often needed to allow data to be transferred into or transferred out from a device.



# **Three-Bus Implementation**



- It is common to interconnect the processor hardware via three buses:
  - Why 3? Typical instruction format!
  - Bus A and Bus B allow the data transfer of two source operands to ALU simultaneously.
  - Bus C allows transferring the result (computed by ALU) to the destination operand.
  - Note: Addresses for the three ports of Register File are generated by Control Circuity (not shown in the figure).



## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

# Instruction Fetching and Decoding



- All the instructions share the same actions of **fetching** and **decoding**:
- ① MAR ← [PC], Read memory,
  Wait\_MFC, IR ← [MDR],
  PC ← [PC] + 4
  - Bus B is used to send [PC] to CPU-Memory Interface to fetch the instruction.
  - The data read from the memory are sent to IR over Bus C.
  - Note: Wait\_MFC is needed since the data may be read from the main memory.
- ② Decode instruction
  - [IR] is decoded by **Control Circuity** to generate control signals (not shown).



# Ex 1: Add R5, R6



- ① MAR ← [PC], Read memory,
  Wait\_MFC, IR ← [MDR],
  PC ← [PC] + 4 (shown here)
- ② Decode instruction (shown <u>here</u>)
- $3 \text{ R5} \leftarrow [\text{R5}] + [\text{R6}]$ 
  - [R5] is read from Register File and transferred to InA of ALU via Bus A.
  - [R6] is read from Register File and transferred to InB of ALU via Bus B.
  - ALU is set to perform an Add.
  - The sum is stored back into R5 via Bus C.
  - Note 1: Reading source registers cannot proceed in parallel with the decoding, since CISC-style instructions do not always use the same fields to specify reg. addresses.
  - Note 2: Control Circuitry must carefully coordinate how to read and write Register File within the same step.



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

# Ex 2: And X(R7), R9



 Assume the index offset is a 32-bit value given as the second word of the instruction.



- ① MAR ← [PC], Read memory,
  Wait\_MFC, IR ← [MDR],
  PC ← [PC] + 4 (shown here)
- ② Decode instruction (shown <a href="here">here</a>)
- ③ MAR ← [PC], Read memory,
  Wait\_MFC, Tmp1 ← [MDR],
  PC ← [PC] + 4
  - The second word (i.e., X) is fetched from memory into the temporary register Tmp1.



CSCI2510 Lec09: Basic Processing Unit 2023-24 T1

# Ex 2: And X(R7), R9 (Cont'd)



- 4 Tmp2  $\leftarrow$  [Tmp1] + [R7]
  - [Tmp1] and [R7] are sent to ALU over
     Buses A and B, and the effective address is placed into the temporary register Tmp2.
- ⑤ MAR ← [Tmp2], Read memory, Wait\_MFC, Tmp1 ← [MDR]
  - The contents of memory address X(R7) are read and placed into Tmp1.
- ⑥ Tmp1 ← [Tmp1] AND [R9]
  - The AND computation is performed, and the result is placed into Tmp1.
- ⑦ MAR ← [Tmp2], MDR ← [Tmp1],
  Write memory, Wait MFC
  - The result is stored into the memory at the address X(R7), which is "still" available in Tmp2 since the completion of ④.



## Class Exercise 9.4



- Consider the three-bus implementation of a CISCstyle processor design.
- How many times of memory accesses are involved in Add R5, R6 and And X(R7), R9, respectively?

## **Outline**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

# **Recall: Control Signals**



- The processor's hardware is governed by control signals that determine:
  - What is the input data appearing at a Multiplexer's output?
  - What will be the operation performed by ALU?
  - Will the data be loaded into the selected register?
  - Etc.





# **Control Signal Generation**



There're two typical ways to generate control signals:

### 1) Hardwired Control:

- The combinational circuit is used to "hard code" the generation of control signals with logic gates.
  - The clock and counter specify the current step (e.g., T1).
- It is a "hardware approach" and can operate at high speed.



### 2) Microprogrammed Control:

- Control signals are specified by "micro-programs" in Control Store.
  - The micro-Program Counter (μPC) always points to the next control word (or μ-instruction).
- It is a "software approach" and can support a complex instruction set.



# **Summary**



- Main Components of a Processor
- RISC-Style Processor Design
  - Five-Stage Organization
  - Instruction Execution
- CISC-Style Processor Design
  - Multi-Bus Interconnect
  - Instruction Execution
- Control Signal Generation

